Giorgio Bakhiet Derias, I3a, Bachelor's thesis
The aim of this notebook is to perform a sentiment analysis of the different newspapers that can be read in Switzerland.
Before starting, I need to install the libraries from which I will later import what I need. I created a text file called requirementsNewspaper.txt that lists all the libraries I used. This file is useful when moving to a new environment, because all packages can then be installed at once by simply typing:
#%conda install --file requirementsNewspaper.txt
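Before installing into a new environment, it can help to see which of the listed packages are already present. The following is only a minimal sketch, not part of the original workflow: the helper names `parse_requirements` and `installed_versions` are hypothetical, and it assumes the file contains one package per line, optionally followed by a version specifier.

```python
import re
from importlib import metadata


def parse_requirements(text):
    """Return the package names found in requirements-style text."""
    names = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and URL-based entries such as git+https://...
        if not line or "://" in line:
            continue
        # Keep only the name part, i.e. everything before a version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical usage; the file name is taken from the command above.
# with open("requirementsNewspaper.txt") as handle:
#     print(installed_versions(parse_requirements(handle.read())))
```

Packages reported as None would still need to be installed; everything else is already available in the current environment.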
!python -m pip install --upgrade pip
Requirement already satisfied: pip in /home/mlmp/conda/lib/python3.8/site-packages (21.1.2)
!pip3 install ktrain
!pip3 install git+https://github.com/amaiya/eli5@tfkeras_0_10_1
!pip install plotly-express
!pip install jupyterlab "ipywidgets>=7.5"
Requirement already satisfied: ktrain in /home/mlmp/conda/lib/python3.8/site-packages (0.26.3)
Collecting git+https://github.com/amaiya/eli5@tfkeras_0_10_1
Cloning https://github.com/amaiya/eli5 (to revision tfkeras_0_10_1) to /tmp/pip-req-build-7jeworze
Requirement already satisfied: plotly-express in /home/mlmp/conda/lib/python3.8/site-packages (0.4.1)
Requirement already satisfied: jupyterlab in /home/mlmp/conda/lib/python3.8/site-packages (2.2.6)
Requirement already satisfied: ipywidgets>=7.5 in /home/mlmp/conda/lib/python3.8/site-packages (7.5.1)
!pip install voila
!pip install voila-gridstack
Requirement already satisfied: voila in /home/mlmp/conda/lib/python3.8/site-packages (0.2.10)
Requirement already satisfied: voila-gridstack in /home/mlmp/conda/lib/python3.8/site-packages (0.2.0)
!pip3 freeze > requirementsNewspaper.txt
# Numpy and Pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
import re
# Plotly
import plotly.express as px
from matplotlib import rc
import plotly.graph_objects as go
# KTrain
import ktrain
from ktrain import text
# Utility
import logging
from datetime import date
import time
from IPython.core.display import HTML
# change it for your system
path_predictor = './modelsave/bertDe_predictor_93'
# reload predictor
predictor = ktrain.load_predictor(path_predictor)
predictor.predict('Heute ist ein schöner Tag.')
'1'
Title with the source at the end, without a final period: negative
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt - 20 Minuten")
y=0 (probability 0.632, score -0.540) top features
| Contribution? | Feature |
|---|---|
| +0.569 | <BIAS> |
| -0.029 | Highlighted in text (sum) |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt - 20 minuten
Title with the source at the end, with a final period: positive
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt - 20 Minuten.")
y=1 (probability 0.617, score 0.476) top features
| Contribution? | Feature |
|---|---|
| +1.007 | Highlighted in text (sum) |
| -0.532 | <BIAS> |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt - 20 minuten.
Title without the source, without a final period: positive
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt")
y=1 (probability 0.583, score 0.333) top features
| Contribution? | Feature |
|---|---|
| +1.003 | Highlighted in text (sum) |
| -0.670 | <BIAS> |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt
Title without the source, with a final period: positive (highest probability so far)
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt.")
y=1 (probability 0.849, score 1.728) top features
| Contribution? | Feature |
|---|---|
| +2.295 | Highlighted in text (sum) |
| -0.567 | <BIAS> |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt.
Title + description, with "- source": negative
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt - 20 Minuten. Der Kurs der Kryptowährung Ethereum geht durch die Decke. Der 27-jährige Erfinder Vitalik Buterin ist darum zum Milliardär geworden.")
y=0 (probability 0.641, score -0.581) top features
| Contribution? | Feature |
|---|---|
| +0.350 | Highlighted in text (sum) |
| +0.230 | <BIAS> |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt - 20 minuten. der kurs der kryptowährung ethereum geht durch die decke. der 27-jährige erfinder vitalik buterin ist darum zum milliardär geworden.
Title + description, without "- source": positive
predictor.explain("Vitalik Buterin - Ethereum-Erfinder ist der jüngste Krypto-Milliardär der Welt. Der Kurs der Kryptowährung Ethereum geht durch die Decke. Der 27-jährige Erfinder Vitalik Buterin ist darum zum Milliardär geworden.")
y=1 (probability 0.578, score 0.314) top features
| Contribution? | Feature |
|---|---|
| +0.725 | Highlighted in text (sum) |
| -0.411 | <BIAS> |
vitalik buterin - ethereum-erfinder ist der jüngste krypto-milliardär der welt. der kurs der kryptowährung ethereum geht durch die decke. der 27-jährige erfinder vitalik buterin ist darum zum milliardär geworden.
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh» - 20 Minuten")
y=0 (probability 0.619, score -0.483) top features
| Contribution? | Feature |
|---|---|
| +0.641 | <BIAS> |
| -0.157 | Highlighted in text (sum) |
postfinance und swissquote bringen neue banking-app «yuh» - 20 minuten
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh» - 20 Minuten.")
y=1 (probability 0.544, score 0.175) top features
| Contribution? | Feature |
|---|---|
| +0.729 | Highlighted in text (sum) |
| -0.554 | <BIAS> |
postfinance und swissquote bringen neue banking-app «yuh» - 20 minuten.
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh»")
y=0 (probability 0.599, score -0.403) top features
| Contribution? | Feature |
|---|---|
| +0.635 | <BIAS> |
| -0.232 | Highlighted in text (sum) |
postfinance und swissquote bringen neue banking-app «yuh»
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh».")
y=1 (probability 0.652, score 0.626) top features
| Contribution? | Feature |
|---|---|
| +1.036 | Highlighted in text (sum) |
| -0.410 | <BIAS> |
postfinance und swissquote bringen neue banking-app «yuh».
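The period at the end of the sentence changes the predicted sentiment: in both title variants above, the prediction flips from 0 (negative) to 1 (positive) once the period is added.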
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh» - 20 Minuten. Der Markt mit Smartphone-Banken ist hart umkämpft. Nun kommt ein neuer Anbieter: Mit der App Yuh wollen Postfinance und Swissquote «neue Wege» gehen.")
y=1 (probability 0.797, score 1.366) top features
| Contribution? | Feature |
|---|---|
| +1.432 | Highlighted in text (sum) |
| -0.066 | <BIAS> |
postfinance und swissquote bringen neue banking-app «yuh» - 20 minuten. der markt mit smartphone-banken ist hart umkämpft. nun kommt ein neuer anbieter: mit der app yuh wollen postfinance und swissquote «neue wege» gehen.
predictor.explain("Postfinance und Swissquote bringen neue Banking-App «Yuh». Der Markt mit Smartphone-Banken ist hart umkämpft. Nun kommt ein neuer Anbieter: Mit der App Yuh wollen Postfinance und Swissquote «neue Wege» gehen.")
y=1 (probability 0.853, score 1.760) top features
| Contribution? | Feature |
|---|---|
| +1.779 | Highlighted in text (sum) |
| -0.019 | <BIAS> |
postfinance und swissquote bringen neue banking-app «yuh». der markt mit smartphone-banken ist hart umkämpft. nun kommt ein neuer anbieter: mit der app yuh wollen postfinance und swissquote «neue wege» gehen.
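If I remove the source from the text, the positive prediction becomes more confident (probability 0.853 instead of 0.797).
In these examples, adding the final period and removing the source both improve the prediction.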
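The two observations above suggest normalising each headline before it is passed to the predictor. The following is only a minimal sketch, assuming the source always appears as a trailing ` - <source>` suffix; `strip_source` and `ensure_period` are hypothetical helpers, not part of ktrain.

```python
import re

def strip_source(title, source):
    """Remove a trailing ' - <source>' suffix (case-insensitive), if present."""
    pattern = r"\s+-\s+" + re.escape(source) + r"\.?\s*$"
    return re.sub(pattern, "", title, flags=re.IGNORECASE)

def ensure_period(text):
    """Append a period unless the text already ends with sentence punctuation."""
    text = text.rstrip()
    return text if text.endswith((".", "!", "?")) else text + "."

title = "Postfinance und Swissquote bringen neue Banking-App «Yuh» - 20 Minuten"
cleaned = ensure_period(strip_source(title, "20 Minuten"))
print(cleaned)
```

Only the last ` - <source>` occurrence is removed, so dashes inside the headline itself are kept.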
def noop(src):
    return src
SOURCES = {
'20 Minuten': noop,
'Achgut.com': noop,
'Aargauer Zeitung': noop,
'Aargauerzeitung.ch': noop,
'Aponet.de': noop,
'aponet.de': noop,
'Augsburger Allgemeine': noop,
'aeroTELEGRAPH': noop,
'Aerotelegraph.com': noop,
'Bernerzeitung.ch': noop,
'BZ Berner Zeitung': noop,
'BLICK': noop,
'Blick.ch': noop,
'bluewin.ch': noop,
'Bluewin.ch': noop,
'BILD': noop,
'Bild': noop,
'BTC-ECHO': noop,
'Btc-echo.de': 'BTC-ECHO',
'B.Z. Berlin': noop,
'Businessinsider.de': noop,
'Business Insider Deutschland': noop,
'CHIP Online Deutschland': noop,
'CHIP Online': noop,
'Cryptoticker.io': noop,
'CryptoTicker.io - Bitcoin Kurs, Ethereum Kurs & Crypto News': noop,
'ComputerBase': noop,
'Cointelegraph Deutschland': noop,
'Cointelegraph': noop,
'DER AKTIONÄR': noop,
'DER SPIEGEL': noop,
'derStandard.at': noop,
'DocCheck News': noop,
'Die Achse des Guten': noop,
'DIE WELT': noop,
'Eurosport DE': noop,
'Focus': noop,
'FOCUS Online': noop,
'Frankfurter Rundschau': noop,
'Faz.net': noop,
'FAZ - Frankfurter Allgemeine Zeitung': noop,
'finews.ch': noop,
'Finews.ch': noop,
'futurezone.at': noop,
'Frankfurt-Live.com': noop,
'Frankfurt-live.com': noop,
'GMX.ch': noop,
'Www.gmx.ch': noop,
'Goldreporter.de': noop,
'Google News': noop,
'Herzeblog.de': noop,
'Heilpraxisnet.de': noop,
'Herisau24': noop,
'Herisau24.ch': noop,
'heise online': noop,
'kleinezeitung.at': noop,
'Krone.at': noop,
'IT Magazine': noop,
'Itmagazine.ch': noop,
'Luzerner Zeitung': noop,
'Motorsport-Total.com': noop,
'Motorsport-total.com': noop,
'Neue Zürcher Zeitung': noop,
'NDR.de': noop,
'n-tv NACHRICHTEN': noop,
'Nau.ch': noop,
'Www.nau.ch': noop,
'Oltner Tagblatt': noop,
'ÖKO-TEST': noop,
'PLUS 24': noop,
'Puls24.at': noop,
'Presseportal.de': noop,
'Schweizer Radio und Fernsehen (SRF)': noop,
'Www.srf.ch': 'Schweizer Radio und Fernsehen (SRF)',
'Salzburger Nachrichten': noop,
'Seniorweb Schweiz': noop,
'Seniorweb.ch': noop,
'SPEEDWEEK.COM': noop,
'Speedweek.com': noop,
'scinexx | Das Wissensmagazin': noop,
'St.Galler Tagblatt': noop,
'Spiegel Online': noop,
'Tagesanzeiger.ch': noop,
'Tageblatt-online': noop,
'Tagblatt.ch': noop,
'T3n': noop,
't3n – digital pioneers': noop,
't-online.de': noop,
'Telebasel': noop,
'Telebasel.ch': noop,
'VOX Online': noop,
'Www.vox.de': noop,
'WELT Nachrichtensender': noop,
'WirtschaftsWoche': noop,
'Wirtschafts Woche': noop,
'WELT': noop,
'watson': noop,
'Watson.ch': noop,
}
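def cleanup_src(source):
    if source not in SOURCES:
        # logging.warn is deprecated, logging.warning is the supported name
        logging.warning("Unknown source %s, leaving as-is", source)
        print("'" + source + "':" + source.lower())
    mapping = SOURCES.get(source, noop)
    # an entry is either a function to call or a plain replacement string
    return mapping(source) if callable(mapping) else mapping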
def cleanup_source(text):
    text = text.lower()
    # replacement
    # text = text.replace("WWww", "www")
    text = text.replace("aargauerzeitung.ch", "aargauer zeitung")
    text = text.replace("aerotelegraph.com", "aerotelegraph")
    text = text.replace('auto-motor-und-sport.de', 'auto motor und sport')
    text = text.replace("bernerzeitung.ch", "bz berner zeitung")
    text = text.replace("blick.ch", "blick")
    text = text.replace("btc-echo.de", "btc-echo")
    text = text.replace("btc-echo | bitcoin & blockchain pioneers", "btc-echo")
    text = text.replace("businessinsider.de", "business insider deutschland")
    text = text.replace("cointelegraph", "cointelegraph deutschland")
    text = text.replace("chip online deutschland", "chip online")
    text = text.replace("cryptoticker.io - bitcoin kurs, ethereum kurs & crypto news", "cryptoticker.io")
    text = text.replace("doccheck.com", "doccheck news")
    text = text.replace("eurosport.de", "eurosport de")
    text = text.replace("focus", "focus online")
    text = text.replace("faz.net", "faz - frankfurter allgemeine zeitung")
    text = text.replace("www.fr.de", "frankfurter rundschau")
    text = text.replace("www.gmx.ch", "gmx.ch")
    text = text.replace("herisau24.ch", "herisau24")
    text = text.replace("itmagazine.ch", "it magazine")
    text = text.replace('idee-fuer-mich.de', 'idee für mich')
    text = text.replace("kleinezeitung.at", "kleine zeitung")
    text = text.replace('kurier.at', 'kurier')
    text = text.replace("luzernerzeitung.ch", "luzerner zeitung")
    text = text.replace("www.nau.ch", "nau.ch")
    text = text.replace("n-tv.de", "n-tv nachrichten")
    text = text.replace("www.ndr.de", "ndr.de")
    text = text.replace("www.nzz.ch", "neue zürcher zeitung")
    text = text.replace("neueschweizerzeitung.ch", "neue schweizer zeitung")
    text = text.replace("oltnertagblatt.ch", "oltner tagblatt")
    text = text.replace("oekotest.de", "öko-test")
    text = text.replace("www.srf.ch", "srf")
    text = text.replace("schweizer radio und fernsehen (srf)", "srf")
    text = text.replace("spiegel online", "der spiegel")
    text = text.replace("www.sn.at", "salzburger nachrichten")
    text = text.replace("seniorweb.ch", "seniorweb schweiz")
    text = text.replace("t3n", "t3n – digital pioneers")
    text = text.replace("t-online.de", "t-online")
    text = text.replace("tageblatt.de", "tageblatt-online")
    text = text.replace("tagesanzeiger.ch", "tages-anzeiger")
    text = text.replace("telebasel.ch", "telebasel")
    text = text.replace("www.vox.de", "vox online")
    text = text.replace("watson.ch", "watson")
    return text
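Because `str.replace` matches substrings, a rule such as `"focus"` to `"focus online"` also fires on a name that is already canonical and would turn `"focus online"` into `"focus online online"`. One possible alternative, sketched here with a hypothetical `ALIASES` table and `canonical_source` helper (only a few of the rules are reproduced), is to look up the whole lowercased source name instead:

```python
# Hypothetical alias table: full lowercased source name -> canonical name.
ALIASES = {
    "focus": "focus online",
    "cointelegraph": "cointelegraph deutschland",
    "t3n": "t3n – digital pioneers",
    "blick.ch": "blick",
    "www.srf.ch": "srf",
    "schweizer radio und fernsehen (srf)": "srf",
}

def canonical_source(name):
    """Map a raw source name to its canonical form using exact matching."""
    key = name.strip().lower()
    # Unknown or already canonical names are returned unchanged (lowercased).
    return ALIASES.get(key, key)

print(canonical_source("FOCUS Online"))  # already canonical, left as-is
print(canonical_source("Focus"))         # alias, mapped to the canonical name
```

Exact matching is only suitable because the function is applied to the `source` column, where each cell holds nothing but the source name.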
def clean_src(df):
    # clean sources
    source_apply = df.apply(
        lambda row: cleanup_source(row['source']),
        axis=1
    )
    # reassign column source
    df['source'] = source_apply
    return df
# change it for your system
path_newspaper = "./database_newspaper/total/"
def openNewsPaper(s):
    # import the csv file
    # note: error_bad_lines / warn_bad_lines were deprecated in pandas 1.3 (on_bad_lines='warn' replaces them there)
    df = pd.read_csv(path_newspaper + "total" + s + ".csv", parse_dates=['publishedAt'], encoding='utf8', error_bad_lines=False, warn_bad_lines=True, header=0)
    # drop rows with a missing source, title or description
    df = df[df['source'].notnull()]
    df = df[df['title'].notnull()]
    df = df[df['description'].notnull()]
    df['source'] = df['source'].astype(str)
    df['title'] = df['title'].astype(str)
    df['description'] = df['description'].astype(str)
    df['source'] = df['source'].str.lower()
    df = clean_src(df)
    # merge the columns title and description into a column 'content'
    df['content'] = df[['title', 'description']].apply(lambda x: '. '.join(x), axis=1)
    # create a new column 'sentiment' to store the predictions
    df['sentiment'] = predictor.predict(df['content'].tolist())
    df['sentiment'] = df['sentiment'].astype(str)
    # keep only the columns needed for the analysis and return them
    df_target = df[['source', 'content', 'category', 'publishedAt', 'sentiment']]
    return df_target
news = openNewsPaper("_concat")
news['sentiment'].value_counts()
0    14825
1    10284
Name: sentiment, dtype: int64
len(news)
25109
news = news.drop_duplicates(subset=['content'], ignore_index=True)
len(news)
4486
news['sentiment'].value_counts()
0    2663
1    1823
Name: sentiment, dtype: int64
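Only 4486 of the 25109 rows have distinct content, so most of the predictions above were computed for repeated text. A possible way to avoid that is to predict each distinct text once and map the labels back. This is only a sketch: `predict_unique` is a hypothetical helper, and `fake_predict` merely stands in for `predictor.predict`.

```python
def predict_unique(texts, predict_fn):
    """Run predict_fn once per distinct text and return labels in input order."""
    unique = list(dict.fromkeys(texts))  # distinct texts, first-seen order
    labels = dict(zip(unique, predict_fn(unique)))
    return [labels[t] for t in texts]

calls = []

def fake_predict(batch):
    """Stand-in for predictor.predict: records the batch, labels by length parity."""
    calls.append(list(batch))
    return [str(len(t) % 2) for t in batch]

texts = ["a", "bb", "a", "a", "bb"]
print(predict_unique(texts, fake_predict))
print(len(calls[0]))  # number of texts actually sent to the model
```

Dropping the duplicates before calling the predictor, rather than afterwards, would have the same effect in this notebook.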
data = news.copy()
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4486 entries, 0 to 4485
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   source       4486 non-null   object
 1   content      4486 non-null   object
 2   category     4486 non-null   object
 3   publishedAt  4486 non-null   datetime64[ns, UTC]
 4   sentiment    4486 non-null   object
dtypes: datetime64[ns, UTC](1), object(4)
memory usage: 175.4+ KB
data
| | source | content | category | publishedAt | sentiment |
|---|---|---|---|---|---|
| 0 | 20 minuten | Verletzung im Schädelinneren : Frau lief nach ... | world | 2021-05-01 19:01:07+00:00 | 0 |
| 1 | blick | USA: Freizeitpark wieder auf. 13 Monate lang w... | world | 2021-05-01 08:28:47+00:00 | 1 |
| 2 | 20 minuten | Verdacht auf Menschenschmuggel : US-Polizei fi... | world | 2021-05-01 00:37:51+00:00 | 0 |
| 3 | 20 minuten | Australien macht ernst : Bis zu fünf Jahre Gef... | world | 2021-04-30 20:31:47+00:00 | 0 |
| 4 | blick | Indonesien: Veronika Troshina droht Knast wege... | world | 2021-04-30 18:30:13+00:00 | 0 |
| ... | ... | ... | ... | ... | ... |
| 4481 | t-online | RKI-Zahlen in Deutschland: Bundesweite Sieben-... | science | 2021-05-14 03:18:41+00:00 | 0 |
| 4482 | bild | Thüringen: Wie ein Wirt ganz legal die Corona-... | science | 2021-05-13 18:14:05+00:00 | 0 |
| 4483 | augsburger allgemeine | Vorsicht: Ausgekugelte Schulter nie selbst beh... | health | 2021-05-13 14:36:53+00:00 | 0 |
| 4484 | www.rtl.de | Covid-19 ist doch keine Atemwegserkrankung - L... | health | 2021-05-12 17:19:00+00:00 | 1 |
| 4485 | k.at | ADHS bei Erwachsenen - "Im Kopf ist ständig Tu... | health | 2021-05-12 16:36:18+00:00 | 1 |
4486 rows × 5 columns
data['date_parsed'] = data['publishedAt'].dt.strftime('%Y-%m-%d')
data= data.drop(columns='publishedAt')
data
| | source | content | category | sentiment | date_parsed |
|---|---|---|---|---|---|
| 0 | 20 minuten | Verletzung im Schädelinneren : Frau lief nach ... | world | 0 | 2021-05-01 |
| 1 | blick | USA: Freizeitpark wieder auf. 13 Monate lang w... | world | 1 | 2021-05-01 |
| 2 | 20 minuten | Verdacht auf Menschenschmuggel : US-Polizei fi... | world | 0 | 2021-05-01 |
| 3 | 20 minuten | Australien macht ernst : Bis zu fünf Jahre Gef... | world | 0 | 2021-04-30 |
| 4 | blick | Indonesien: Veronika Troshina droht Knast wege... | world | 0 | 2021-04-30 |
| ... | ... | ... | ... | ... | ... |
| 4481 | t-online | RKI-Zahlen in Deutschland: Bundesweite Sieben-... | science | 0 | 2021-05-14 |
| 4482 | bild | Thüringen: Wie ein Wirt ganz legal die Corona-... | science | 0 | 2021-05-13 |
| 4483 | augsburger allgemeine | Vorsicht: Ausgekugelte Schulter nie selbst beh... | health | 0 | 2021-05-13 |
| 4484 | www.rtl.de | Covid-19 ist doch keine Atemwegserkrankung - L... | health | 1 | 2021-05-12 |
| 4485 | k.at | ADHS bei Erwachsenen - "Im Kopf ist ständig Tu... | health | 1 | 2021-05-12 |
4486 rows × 5 columns
news_concat = data.copy()
pd.set_option('display.max_colwidth', None)
news_concat
| | source | content | category | sentiment | date_parsed |
|---|---|---|---|---|---|
| 0 | 20 minuten | Verletzung im Schädelinneren : Frau lief nach Corona-Test Hirnwasser aus dem Kopf. In Osnabrück ist eine Frau beim Corona-Schnelltest im Inneren ihres Schädels verletzt worden. Danach lief ihr wochenlang Hirnwasser aus dem Kopf. | world | 0 | 2021-05-01 |
| 1 | blick | USA: Freizeitpark wieder auf. 13 Monate lang war Disneyland wegen der Corona-Pandemie stillgelegt, nun hat der beliebte Freizeitpark in Kalifornien wieder auf. | world | 1 | 2021-05-01 |
| 2 | 20 minuten | Verdacht auf Menschenschmuggel : US-Polizei findet 91 Menschen ohne Papiere in Wohnhaus. Auf Hinweis einer Entführung finden Polizeibeamte in Houston, im US-Bundesstaat Texas, 91 Frauen und Männer ohne gültige Aufenthaltspapiere. | world | 0 | 2021-05-01 |
| 3 | 20 minuten | Australien macht ernst : Bis zu fünf Jahre Gefängnis für Heimkehrer aus Hochrisikogebieten. Australien plant radikale Massnahmen für Personen, die illegal aus Corona-Hochrisikogebieten wie Indien einreisen: Ihnen könnte künftig bis zu fünf Jahren Gefängnis drohen. | world | 0 | 2021-04-30 |
| 4 | blick | Indonesien: Veronika Troshina droht Knast wegen Porno-Dreh auf Bali. Für den Dreh eines Amateur-Sexclips haben sich die Russin Veronika Troshina (22) und ihr Partner ausgerechnet einen heiligen Berg auf Bali ausgesucht. Dafür sucht sie nun die Polizei | world | 0 | 2021-04-30 |
| ... | ... | ... | ... | ... | ... |
| 4481 | t-online | RKI-Zahlen in Deutschland: Bundesweite Sieben-Tage-Inzidenz sinkt auf unter 100. Erstmals seit dem 20. März vermeldet das RKI eine Sieben-Tage-Inzidenz unter dem kritischen Schwellenwert. Auch die Zahl der gemeldeten Neuinfektionen liegt deutlich unter der Vorwoche. | science | 0 | 2021-05-14 |
| 4482 | bild | Thüringen: Wie ein Wirt ganz legal die Corona-Regeln umgeht. Gotha (Thüringen) – Allein in den letzten Tagen wurden über 600 Gäste bekocht – Trotz Notbremse und Inzidenz weit über 200. | science | 0 | 2021-05-13 |
| 4483 | augsburger allgemeine | Vorsicht: Ausgekugelte Schulter nie selbst behandeln. Eine ausgekugelte Schulter ist ausgesprochen schmerzhaft und das Einkugeln mitunter abenteuerlich. Damit alles wieder dahin kommt, wo es hingehört, hat der Arzt... | health | 0 | 2021-05-13 |
| 4484 | www.rtl.de | Covid-19 ist doch keine Atemwegserkrankung - Lauterbach: "wichtige Studie" - RTL Online. Eine neue Studie zeigt nun, dass die besonderen Spike-Proteine auch bei der durch das Coronavirus ausgelösten Covid-19-Erkrankung eine Schlüsselrolle spielen. | health | 1 | 2021-05-12 |
| 4485 | k.at | ADHS bei Erwachsenen - "Im Kopf ist ständig Turbo" - k.at. Lange Zeit galt ADHS als reine Kinderkrankheit. Doch ungefähr die Hälfte aller Betroffenen nimmt die Störung mit ins Erwachsenenalter. Christian Krohn ist einer von ihnen. | health | 1 | 2021-05-12 |
4486 rows × 5 columns
#export df
news_concat.to_csv('news_concat.csv')
Now that the data has been imported, classified and cleaned, I can start to analyse it with Plotly. To display the data correctly I first have to normalise it, i.e. convert the article counts into percentages. I have written two functions for this purpose.
def normalize(df):
    # copy the data so the input frame is left untouched
    df_max_scal = df.copy()
    # express each count as a percentage of the total count
    # (the original loop over the columns repeated this same computation and is not needed)
    df_max_scal['sentiment %'] = (df_max_scal['count'] / df_max_scal['count'].sum()) * 100
    df_max_scal['sentiment %'] = df_max_scal['sentiment %'].round(decimals=2)
    return df_max_scal
def norm(x):
    x['sentiment %'] = (x['count'] / x['count'].sum()) * 100
    x['sentiment %'] = x['sentiment %'].round(decimals=2)
    return x
tot= news_concat.groupby(['sentiment']).size().reset_index()
tot['sentiment'] = tot['sentiment'].astype(str)
tot = tot.rename(columns={0:'count'})
tot = normalize(tot)
tot
| | sentiment | count | sentiment % |
|---|---|---|---|
| 0 | 0 | 2663 | 59.36 |
| 1 | 1 | 1823 | 40.64 |
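As a check on the normalisation, the percentages in this table follow directly from the counts: each count is divided by the total and multiplied by 100. A small dependency-free sketch, where `shares` is a hypothetical helper that mirrors what `normalize` computes:

```python
def shares(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {key: round(value / total * 100, 2) for key, value in counts.items()}

# Counts taken from the table above.
print(shares({"0": 2663, "1": 1823}))
```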
figTotal = px.bar(tot,
x="sentiment",
y="sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
labels={
"sentiment": "Sentiment",
"sentiment %": "# of articles (%)"
},
title="Total Positive vs Negative"
)
figTotal.show()
grouped= news_concat.groupby(['source','sentiment']).size().reset_index()
grouped['sentiment'] = grouped['sentiment'].astype(str)
grouped = grouped.rename(columns={0:'count'})
grouped = normalize(grouped)
grouped
| | source | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | cash | 0 | 29 | 0.65 |
| 1 | cash | 1 | 21 | 0.47 |
| 2 | technik smartphone news | 0 | 1 | 0.02 |
| 3 | 11freunde.de | 0 | 1 | 0.02 |
| 4 | 20 minuten | 0 | 461 | 10.28 |
| ... | ... | ... | ... | ... |
| 392 | xboxdynasty | 1 | 1 | 0.02 |
| 393 | xboxdynasty.de | 1 | 1 | 0.02 |
| 394 | youtube | 0 | 1 | 0.02 |
| 395 | zofingertagblatt.ch | 0 | 1 | 0.02 |
| 396 | öko-test | 0 | 1 | 0.02 |
397 rows × 4 columns
figNews = px.bar(grouped,
x="source",
y="sentiment %",
text = "sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
#facet_col='source', facet_col_wrap=4
#facet_row="targetTitle",
#facet_col="category",
)
figNews.update_layout(xaxis={'categoryorder':'total descending'})
figNews.show()
grouped.source.value_counts()
business insider deutschland 2
coincierge.de 2
mopo.de 2
bulgarisches wirtschaftsblatt 2
nzzas.nzz.ch 2
..
mobiflip.de 1
gmx.at 1
np-coburg.de 1
merkur online 1
ntower.de 1
Name: source, Length: 282, dtype: int64
# Keep only (source, sentiment) rows with more than 5 articles; a newspaper left with a single sentiment row is then normalized to 100 %.
clean = grouped.loc[grouped['count'] > 5]
# uncomment the next line to keep only newspapers that still have both sentiments
#clean = clean[clean['source'].map(clean['source'].value_counts()) > 1]
clean = clean.groupby(['source']).apply(norm).reset_index(drop=True)
clean
| | source | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | cash | 0 | 29 | 58.00 |
| 1 | cash | 1 | 21 | 42.00 |
| 2 | 20 minuten | 0 | 461 | 64.57 |
| 3 | 20 minuten | 1 | 253 | 35.43 |
| 4 | aargauer zeitung | 0 | 12 | 66.67 |
| ... | ... | ... | ... | ... |
| 88 | watson | 0 | 52 | 44.07 |
| 89 | watson | 1 | 66 | 55.93 |
| 90 | wirtschaftsblatt-bg.com | 0 | 6 | 100.00 |
| 91 | www.rtl.de | 0 | 8 | 57.14 |
| 92 | www.rtl.de | 1 | 6 | 42.86 |
93 rows × 4 columns
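Note that the filter above drops individual (source, sentiment) rows, so a source with, say, 6 negative and 3 positive articles keeps only its negative row and is then reported as 100 % negative. One possible alternative, sketched in plain Python with made-up counts for `example.ch` and `tiny.ch`, is to apply the threshold to the total number of articles per source before computing the shares. `shares_by_source` is a hypothetical helper.

```python
from collections import defaultdict

def shares_by_source(rows, min_total=5):
    """rows: (source, sentiment, count) tuples. Keep sources with more than
    min_total articles overall, then express each count as a percentage
    of the total of its source."""
    totals = defaultdict(int)
    for source, _, count in rows:
        totals[source] += count
    return [
        (source, sentiment, count, round(count / totals[source] * 100, 2))
        for source, sentiment, count in rows
        if totals[source] > min_total
    ]

rows = [
    ("cash", "0", 29), ("cash", "1", 21),            # counts from the table above
    ("example.ch", "0", 6), ("example.ch", "1", 3),  # hypothetical source
    ("tiny.ch", "1", 2),                             # hypothetical source, dropped
]
for row in shares_by_source(rows):
    print(row)
```

With this variant the hypothetical `example.ch` keeps both rows, so its negative share reflects all nine of its articles rather than only the six negative ones.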
clean2 = clean.sort_values(by=['count'], ascending=False)
display(clean2.head(35))
| | source | sentiment | count | sentiment % |
|---|---|---|---|---|
| 13 | blick | 0 | 509 | 63.62 |
| 2 | 20 minuten | 0 | 461 | 64.57 |
| 14 | blick | 1 | 291 | 36.38 |
| 3 | 20 minuten | 1 | 253 | 35.43 |
| 80 | srf | 1 | 207 | 50.12 |
| 79 | srf | 0 | 206 | 49.88 |
| 65 | nau.ch | 0 | 100 | 60.61 |
| 86 | telebasel | 0 | 73 | 57.94 |
| 48 | heilpraxisnet.de | 0 | 70 | 70.00 |
| 89 | watson | 1 | 66 | 55.93 |
| 84 | tages-anzeiger | 0 | 65 | 69.15 |
| 66 | nau.ch | 1 | 65 | 39.39 |
| 81 | t-online | 0 | 64 | 75.29 |
| 16 | bluewin.ch | 1 | 60 | 52.17 |
| 15 | bluewin.ch | 0 | 55 | 47.83 |
| 87 | telebasel | 1 | 53 | 42.06 |
| 68 | neue zürcher zeitung | 0 | 52 | 71.23 |
| 88 | watson | 0 | 52 | 44.07 |
| 11 | bild | 0 | 51 | 80.95 |
| 8 | augsburger allgemeine | 0 | 45 | 60.81 |
| 77 | speedweek.com | 0 | 35 | 57.38 |
| 33 | faz - frankfurter allgemeine zeitung | 0 | 35 | 70.00 |
| 45 | gmx.ch | 0 | 34 | 75.56 |
| 26 | der spiegel | 0 | 33 | 67.35 |
| 18 | btc-echo | 1 | 30 | 50.85 |
| 49 | heilpraxisnet.de | 1 | 30 | 30.00 |
| 0 | cash | 0 | 29 | 58.00 |
| 85 | tages-anzeiger | 1 | 29 | 30.85 |
| 9 | augsburger allgemeine | 1 | 29 | 39.19 |
| 17 | btc-echo | 0 | 29 | 49.15 |
| 37 | focus online | 0 | 27 | 57.45 |
| 78 | speedweek.com | 1 | 26 | 42.62 |
| 75 | scinexx \| das wissensmagazin | 0 | 23 | 56.10 |
| 35 | finews.ch | 0 | 22 | 50.00 |
| 36 | finews.ch | 1 | 22 | 50.00 |
figNews = px.bar(clean,
x="source",
y="sentiment %",
text = "sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
labels={
"source": "Sources",
"sentiment %": "# of articles (%)",
"sentiment": "Sentiment"
},
title="Newspaper Positive vs Negative"
)
figNews.show()
# top 10 newspapers by share of positive articles (only those with more than 15 positive articles)
gr10Pos = clean.loc[clean['sentiment'] == '1']
gr10Pos = gr10Pos.loc[gr10Pos['count'] > 15]
gr10Pos = gr10Pos.sort_values(by=['sentiment %'], ascending=False).reset_index(drop=True)
gr10Pos = gr10Pos.head(10)
gr10Pos
| | source | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | auto motor und sport | 1 | 19 | 100.00 |
| 1 | schweizer-illustrierte.ch | 1 | 17 | 100.00 |
| 2 | watson | 1 | 66 | 55.93 |
| 3 | aerotelegraph | 1 | 21 | 55.26 |
| 4 | chip online | 1 | 17 | 54.84 |
| 5 | futurezone.at | 1 | 18 | 52.94 |
| 6 | bluewin.ch | 1 | 60 | 52.17 |
| 7 | btc-echo | 1 | 30 | 50.85 |
| 8 | srf | 1 | 207 | 50.12 |
| 9 | finews.ch | 1 | 22 | 50.00 |
# top 10 newspapers by share of negative articles (only those with more than 15 negative articles)
gr10Neg = clean.loc[clean['sentiment'] == '0']
gr10Neg = gr10Neg.loc[gr10Neg['count'] > 15]
gr10Neg = gr10Neg.sort_values(by=['sentiment %'], ascending=False).reset_index(drop=True)
gr10Neg = gr10Neg.head(10)
gr10Neg
| | source | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | heidelberg24.de | 0 | 16 | 100.00 |
| 1 | tagblatt.ch | 0 | 19 | 100.00 |
| 2 | bild | 0 | 51 | 80.95 |
| 3 | gmx.ch | 0 | 34 | 75.56 |
| 4 | t-online | 0 | 64 | 75.29 |
| 5 | neue zürcher zeitung | 0 | 52 | 71.23 |
| 6 | faz - frankfurter allgemeine zeitung | 0 | 35 | 70.00 |
| 7 | heilpraxisnet.de | 0 | 70 | 70.00 |
| 8 | tages-anzeiger | 0 | 65 | 69.15 |
| 9 | der spiegel | 0 | 33 | 67.35 |
figNews10 = px.bar(gr10Pos,
x="source",
y="sentiment %",
text = "sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
labels={
"source": "Sources",
"sentiment %": "# of articles (%)",
"sentiment": "Sentiment"
},
title="Top 10 Positive Newspaper"
)
figNews10.show()
figNews10 = px.bar(gr10Neg,
x="source",
y="sentiment %",
text = "sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
labels={
"source": "Sources",
"sentiment %": "# of articles (%)",
"sentiment": "Sentiment"
},
title="Top 10 Negative Newspaper"
)
figNews10.show()
category = news_concat.groupby(['category','sentiment']).size().reset_index()
category['sentiment'] = category['sentiment'].astype(str)
category = category.rename(columns={0:'count'})
# normalize over the whole dataset (all rows together sum to 100 %)
category = normalize(category)
category
| | category | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | business | 0 | 185 | 4.12 |
| 1 | business | 1 | 190 | 4.24 |
| 2 | entertainment | 0 | 279 | 6.22 |
| 3 | entertainment | 1 | 227 | 5.06 |
| 4 | health | 0 | 360 | 8.02 |
| 5 | health | 1 | 180 | 4.01 |
| 6 | nation | 0 | 235 | 5.24 |
| 7 | nation | 1 | 64 | 1.43 |
| 8 | science | 0 | 396 | 8.83 |
| 9 | science | 1 | 244 | 5.44 |
| 10 | sport | 0 | 330 | 7.36 |
| 11 | sport | 1 | 345 | 7.69 |
| 12 | technology | 0 | 155 | 3.46 |
| 13 | technology | 1 | 138 | 3.08 |
| 14 | world | 0 | 723 | 16.12 |
| 15 | world | 1 | 435 | 9.70 |
# normalize within each category (each category sums to 100 %)
category_clean = category.groupby(['category']).apply(norm).reset_index(drop=True)
category_clean
| | category | sentiment | count | sentiment % |
|---|---|---|---|---|
| 0 | business | 0 | 185 | 49.33 |
| 1 | business | 1 | 190 | 50.67 |
| 2 | entertainment | 0 | 279 | 55.14 |
| 3 | entertainment | 1 | 227 | 44.86 |
| 4 | health | 0 | 360 | 66.67 |
| 5 | health | 1 | 180 | 33.33 |
| 6 | nation | 0 | 235 | 78.60 |
| 7 | nation | 1 | 64 | 21.40 |
| 8 | science | 0 | 396 | 61.88 |
| 9 | science | 1 | 244 | 38.12 |
| 10 | sport | 0 | 330 | 48.89 |
| 11 | sport | 1 | 345 | 51.11 |
| 12 | technology | 0 | 155 | 52.90 |
| 13 | technology | 1 | 138 | 47.10 |
| 14 | world | 0 | 723 | 62.44 |
| 15 | world | 1 | 435 | 37.56 |
figCat = px.bar(category_clean,
x="category",
y="sentiment %",
text = "sentiment %",
barmode="group",
color="sentiment",
color_discrete_map={
'0': '#ef553b',
'1': '#00cc96'
},
labels={
"category": "Category",
"sentiment %": "# of articles (%)",
"sentiment": "Sentiment"
},
title="Positive vs Negative Category"
)
figCat.show()
I will only consider the three largest newspapers by number of articles.
newspaper_source = [
'20 minuten',
'blick',
'srf',
]
news_small = news_concat[news_concat.source.isin(newspaper_source)]
sourceCat = news_small.groupby(['source','category','sentiment']).size().reset_index()
sourceCat['sentiment'] = sourceCat['sentiment'].astype(str)
sourceCat = sourceCat.rename(columns={0:'count'})
# normalized per category only: each category sums to 100 % across all three newspapers
sourceCat = sourceCat.groupby(['category']).apply(norm).reset_index(drop=True)
sourceCat
| | source | category | sentiment | count | sentiment % |
|---|---|---|---|---|---|
| 0 | 20 minuten | business | 0 | 31 | 20.95 |
| 1 | 20 minuten | business | 1 | 33 | 22.30 |
| 2 | 20 minuten | entertainment | 0 | 56 | 22.40 |
| 3 | 20 minuten | entertainment | 1 | 65 | 26.00 |
| 4 | 20 minuten | health | 0 | 2 | 100.00 |
| 5 | 20 minuten | nation | 0 | 106 | 40.30 |
| 6 | 20 minuten | nation | 1 | 21 | 7.98 |
| 7 | 20 minuten | science | 0 | 10 | 20.41 |
| 8 | 20 minuten | science | 1 | 10 | 20.41 |
| 9 | 20 minuten | sport | 0 | 41 | 10.73 |
| 10 | 20 minuten | sport | 1 | 22 | 5.76 |
| 11 | 20 minuten | technology | 0 | 13 | 22.41 |
| 12 | 20 minuten | technology | 1 | 12 | 20.69 |
| 13 | 20 minuten | world | 0 | 202 | 26.06 |
| 14 | 20 minuten | world | 1 | 90 | 11.61 |
| 15 | blick | business | 0 | 28 | 18.92 |
| 16 | blick | business | 1 | 41 | 27.70 |
| 17 | blick | entertainment | 0 | 72 | 28.80 |
| 18 | blick | entertainment | 1 | 45 | 18.00 |
| 19 | blick | nation | 0 | 86 | 32.70 |
| 20 | blick | nation | 1 | 27 | 10.27 |
| 21 | blick | science | 0 | 4 | 8.16 |
| 22 | blick | science | 1 | 8 | 16.33 |
| 23 | blick | sport | 0 | 109 | 28.53 |
| 24 | blick | sport | 1 | 84 | 21.99 |
| 25 | blick | technology | 0 | 12 | 20.69 |
| 26 | blick | technology | 1 | 7 | 12.07 |
| 27 | blick | world | 0 | 198 | 25.55 |
| 28 | blick | world | 1 | 79 | 10.19 |
| 29 | srf | business | 0 | 10 | 6.76 |
| 30 | srf | business | 1 | 5 | 3.38 |
| 31 | srf | entertainment | 0 | 5 | 2.00 |
| 32 | srf | entertainment | 1 | 7 | 2.80 |
| 33 | srf | nation | 0 | 16 | 6.08 |
| 34 | srf | nation | 1 | 7 | 2.66 |
| 35 | srf | science | 0 | 9 | 18.37 |
| 36 | srf | science | 1 | 8 | 16.33 |
| 37 | srf | sport | 0 | 40 | 10.47 |
| 38 | srf | sport | 1 | 86 | 22.51 |
| 39 | srf | technology | 0 | 10 | 17.24 |
| 40 | srf | technology | 1 | 4 | 6.90 |
| 41 | srf | world | 0 | 116 | 14.97 |
| 42 | srf | world | 1 | 90 | 11.61 |
# Correct normalisation: within each (source, category) pair, so negative and positive shares sum to 100 %
sourceClean = sourceCat.groupby(['source','category']).apply(norm).reset_index(drop=True)
sourceClean
| | source | category | sentiment | count | sentiment % |
|---|---|---|---|---|---|
| 0 | 20 minuten | business | 0 | 31 | 48.44 |
| 1 | 20 minuten | business | 1 | 33 | 51.56 |
| 2 | 20 minuten | entertainment | 0 | 56 | 46.28 |
| 3 | 20 minuten | entertainment | 1 | 65 | 53.72 |
| 4 | 20 minuten | health | 0 | 2 | 100.00 |
| 5 | 20 minuten | nation | 0 | 106 | 83.46 |
| 6 | 20 minuten | nation | 1 | 21 | 16.54 |
| 7 | 20 minuten | science | 0 | 10 | 50.00 |
| 8 | 20 minuten | science | 1 | 10 | 50.00 |
| 9 | 20 minuten | sport | 0 | 41 | 65.08 |
| 10 | 20 minuten | sport | 1 | 22 | 34.92 |
| 11 | 20 minuten | technology | 0 | 13 | 52.00 |
| 12 | 20 minuten | technology | 1 | 12 | 48.00 |
| 13 | 20 minuten | world | 0 | 202 | 69.18 |
| 14 | 20 minuten | world | 1 | 90 | 30.82 |
| 15 | blick | business | 0 | 28 | 40.58 |
| 16 | blick | business | 1 | 41 | 59.42 |
| 17 | blick | entertainment | 0 | 72 | 61.54 |
| 18 | blick | entertainment | 1 | 45 | 38.46 |
| 19 | blick | nation | 0 | 86 | 76.11 |
| 20 | blick | nation | 1 | 27 | 23.89 |
| 21 | blick | science | 0 | 4 | 33.33 |
| 22 | blick | science | 1 | 8 | 66.67 |
| 23 | blick | sport | 0 | 109 | 56.48 |
| 24 | blick | sport | 1 | 84 | 43.52 |
| 25 | blick | technology | 0 | 12 | 63.16 |
| 26 | blick | technology | 1 | 7 | 36.84 |
| 27 | blick | world | 0 | 198 | 71.48 |
| 28 | blick | world | 1 | 79 | 28.52 |
| 29 | srf | business | 0 | 10 | 66.67 |
| 30 | srf | business | 1 | 5 | 33.33 |
| 31 | srf | entertainment | 0 | 5 | 41.67 |
| 32 | srf | entertainment | 1 | 7 | 58.33 |
| 33 | srf | nation | 0 | 16 | 69.57 |
| 34 | srf | nation | 1 | 7 | 30.43 |
| 35 | srf | science | 0 | 9 | 52.94 |
| 36 | srf | science | 1 | 8 | 47.06 |
| 37 | srf | sport | 0 | 40 | 31.75 |
| 38 | srf | sport | 1 | 86 | 68.25 |
| 39 | srf | technology | 0 | 10 | 71.43 |
| 40 | srf | technology | 1 | 4 | 28.57 |
| 41 | srf | world | 0 | 116 | 56.31 |
| 42 | srf | world | 1 | 90 | 43.69 |
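The two tables above differ only in the grouping passed to `norm`, a helper defined earlier in the notebook. As a minimal sketch, assuming `norm` divides each count by its group total and rounds to two decimals (which matches the business rows shown), the same shares can be reproduced with `groupby(...).transform("sum")`. The function `add_share` is hypothetical and stands in for the notebook's helper rather than reproducing it.

```python
import pandas as pd


def add_share(df: pd.DataFrame, keys) -> pd.DataFrame:
    """Hypothetical stand-in for `norm`: share of each row within its group, in %."""
    out = df.copy()
    # transform("sum") broadcasts each group's total back onto its rows
    totals = out.groupby(keys)["count"].transform("sum")
    out["sentiment %"] = (out["count"] / totals * 100).round(2)
    return out


# Business counts copied from the tables above
counts = pd.DataFrame({
    "source": ["20 minuten", "20 minuten", "blick", "blick", "srf", "srf"],
    "category": ["business"] * 6,
    "sentiment": ["0", "1", "0", "1", "0", "1"],
    "count": [31, 33, 28, 41, 10, 5],
})

# Grouping by category alone mixes all sources into one denominator
by_category = add_share(counts, ["category"])

# Grouping by source and category compares negative vs. positive within one source
by_pair = add_share(counts, ["source", "category"])

print(by_category)
print(by_pair)
```

With the second grouping, the two rows of each source add up to 100 %, which is what the bar chart per source and category needs.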
figCat = px.bar(
    sourceClean,
    x="source",
    y="sentiment %",
    text="sentiment %",
    # barmode="group",
    color="sentiment",
    # Red for negative (0), green for positive (1)
    color_discrete_map={
        '0': '#ef553b',
        '1': '#00cc96'
    },
    # facet_row='sentiment',
    facet_col="category",
    # facet_col_wrap=4,
    labels={
        "source": "Sources",
        "sentiment %": "Share of articles (%)",
        "sentiment": "Sentiment"
    },
    title="Top 3 in CH"
)
figCat.show()
For the visualisation, I multiply the percentages of negative-sentiment rows by -1 so that their bars are drawn below the zero line.
sourceClean2 = sourceClean.copy()
sourceClean2['sentiment %'] = sourceClean2['sentiment %'] * (2 * sourceClean2['sentiment'].astype(int) - 1)
sourceClean2
| | source | category | sentiment | count | sentiment % |
|---|---|---|---|---|---|
| 0 | 20 minuten | business | 0 | 31 | -48.44 |
| 1 | 20 minuten | business | 1 | 33 | 51.56 |
| 2 | 20 minuten | entertainment | 0 | 56 | -46.28 |
| 3 | 20 minuten | entertainment | 1 | 65 | 53.72 |
| 4 | 20 minuten | health | 0 | 2 | -100.00 |
| 5 | 20 minuten | nation | 0 | 106 | -83.46 |
| 6 | 20 minuten | nation | 1 | 21 | 16.54 |
| 7 | 20 minuten | science | 0 | 10 | -50.00 |
| 8 | 20 minuten | science | 1 | 10 | 50.00 |
| 9 | 20 minuten | sport | 0 | 41 | -65.08 |
| 10 | 20 minuten | sport | 1 | 22 | 34.92 |
| 11 | 20 minuten | technology | 0 | 13 | -52.00 |
| 12 | 20 minuten | technology | 1 | 12 | 48.00 |
| 13 | 20 minuten | world | 0 | 202 | -69.18 |
| 14 | 20 minuten | world | 1 | 90 | 30.82 |
| 15 | blick | business | 0 | 28 | -40.58 |
| 16 | blick | business | 1 | 41 | 59.42 |
| 17 | blick | entertainment | 0 | 72 | -61.54 |
| 18 | blick | entertainment | 1 | 45 | 38.46 |
| 19 | blick | nation | 0 | 86 | -76.11 |
| 20 | blick | nation | 1 | 27 | 23.89 |
| 21 | blick | science | 0 | 4 | -33.33 |
| 22 | blick | science | 1 | 8 | 66.67 |
| 23 | blick | sport | 0 | 109 | -56.48 |
| 24 | blick | sport | 1 | 84 | 43.52 |
| 25 | blick | technology | 0 | 12 | -63.16 |
| 26 | blick | technology | 1 | 7 | 36.84 |
| 27 | blick | world | 0 | 198 | -71.48 |
| 28 | blick | world | 1 | 79 | 28.52 |
| 29 | srf | business | 0 | 10 | -66.67 |
| 30 | srf | business | 1 | 5 | 33.33 |
| 31 | srf | entertainment | 0 | 5 | -41.67 |
| 32 | srf | entertainment | 1 | 7 | 58.33 |
| 33 | srf | nation | 0 | 16 | -69.57 |
| 34 | srf | nation | 1 | 7 | 30.43 |
| 35 | srf | science | 0 | 9 | -52.94 |
| 36 | srf | science | 1 | 8 | 47.06 |
| 37 | srf | sport | 0 | 40 | -31.75 |
| 38 | srf | sport | 1 | 86 | 68.25 |
| 39 | srf | technology | 0 | 10 | -71.43 |
| 40 | srf | technology | 1 | 4 | 28.57 |
| 41 | srf | world | 0 | 116 | -56.31 |
| 42 | srf | world | 1 | 90 | 43.69 |
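The factor `2 * sentiment - 1` used above maps the label 0 to -1 and the label 1 to +1, so only the negative shares change sign. A minimal pure-Python sketch of that mapping follows; the function `signed_share` is a hypothetical name used for illustration only.

```python
def signed_share(share: float, sentiment: str) -> float:
    """Flip the sign of a share when the sentiment label is '0' (negative)."""
    sign = 2 * int(sentiment) - 1  # '0' -> -1, '1' -> +1
    return share * sign


# (sentiment label, share in %) for 20 minuten / business, taken from the table
rows = [("0", 48.44), ("1", 51.56)]
signed = [signed_share(share, label) for label, share in rows]
print(signed)
```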
figCat = px.bar(
    sourceClean2,
    x="source",
    y="sentiment %",
    text="sentiment %",
    # barmode="group",
    color="sentiment",
    # Red for negative (0), green for positive (1)
    color_discrete_map={
        '0': '#ef553b',
        '1': '#00cc96'
    },
    # facet_row='sentiment',
    facet_col="category",
    # facet_col_wrap=4,
    labels={
        "source": "Sources",
        "sentiment %": "Share of articles (%)",
        "sentiment": "Sentiment"
    },
    title="Top 3 in CH Optimized"
)
figCat.show()
sourceCat = news_concat.groupby(['source','category','sentiment']).size().reset_index()
sourceCat['sentiment'] = sourceCat['sentiment'].astype(str)
sourceCat = sourceCat.rename(columns={0:'count'})
sourceCat = sourceCat.groupby(['category']).apply(norm).reset_index(drop=True)
sourceCat
| | source | category | sentiment | count | sentiment % |
|---|---|---|---|---|---|
| 0 | cash | science | 0 | 1 | 0.16 |
| 1 | cash | technology | 0 | 28 | 9.56 |
| 2 | cash | technology | 1 | 21 | 7.17 |
| 3 | technik smartphone news | health | 0 | 1 | 0.19 |
| 4 | 11freunde.de | sport | 0 | 1 | 0.15 |
| ... | ... | ... | ... | ... | ... |
| 607 | xboxdynasty | technology | 1 | 1 | 0.34 |
| 608 | xboxdynasty.de | world | 1 | 1 | 0.09 |
| 609 | youtube | science | 0 | 1 | 0.16 |
| 610 | zofingertagblatt.ch | health | 0 | 1 | 0.19 |
| 611 | öko-test | health | 0 | 1 | 0.19 |
612 rows × 5 columns
def plot_spider(source_name):
    """Draw a radar chart of negative vs. positive sentiment shares per category for one source."""
    d_name = str(source_name)
    # Keep only this source's rows, then re-normalise within each category
    df_source = sourceCat.loc[sourceCat['source'] == d_name]
    df_source = df_source.groupby(['category']).apply(norm).reset_index(drop=True)
    df_pos = df_source.loc[df_source['sentiment'] == '1']
    df_neg = df_source.loc[df_source['sentiment'] == '0']
    label_neg = d_name + " NEG %"
    label_pos = d_name + " POS %"
    fig = go.Figure()
    # Negative shares in red
    fig.add_trace(go.Scatterpolar(
        r=df_neg['sentiment %'],
        theta=df_neg['category'],
        fill='toself',
        mode='markers',
        name=label_neg,
        line_color='#ef553b'
    ))
    # Positive shares in green
    fig.add_trace(go.Scatterpolar(
        r=df_pos['sentiment %'],
        theta=df_pos['category'],
        fill='toself',
        mode='markers',
        name=label_pos,
        line_color='#00cc96'
    ))
    fig.update_layout(
        title='Spider Comparison: ' + d_name,
        showlegend=True
    )
    fig.show()
list_newspaper = [
'20 minuten',
'blick',
'bluewin.ch',
'finews.ch',
'nau.ch',
'neue zürcher zeitung',
'srf',
'telebasel',
'tages-anzeiger',
'watson'
]
for newspaper in list_newspaper:
    plot_spider(newspaper)
def create_df_time(df_filter, subject):
    """Return a wide table: one row per subject/sentiment pair, one column per day."""
    # Count articles per day, subject (e.g. 'source' or 'category') and sentiment
    df = df_filter.groupby(['date_parsed', subject, 'sentiment']).size().reset_index()
    df['sentiment'] = df['sentiment'].astype(str)
    df = df.rename(columns={0: 'count'})
    df = normalize(df)
    df_pos = df.loc[df['sentiment'] == '1']
    df_neg = df.loc[df['sentiment'] == '0']
    # Share (in %) that each subject contributes to the day's positive
    # articles and, separately, to the day's negative articles
    df_pos = df_pos.groupby(['date_parsed']).apply(norm).reset_index(drop=True)
    df_neg = df_neg.groupby(['date_parsed']).apply(norm).reset_index(drop=True)
    # Suffix the subject so positive and negative rows stay distinct after pivoting
    df_pos[subject] = df_pos[subject].astype(str) + '_pos'
    df_neg[subject] = df_neg[subject].astype(str) + '_neg'
    df_concat = pd.concat([df_pos, df_neg], ignore_index=True)
    # Pivot to wide format: rows = subject, columns = dates; missing days become 0
    wide = df_concat.pivot(index=subject, columns='date_parsed', values='sentiment %')
    wide = wide.fillna(0)
    wide = wide.reset_index()
    return wide
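The last step of `create_df_time` reshapes the long table (one row per day and subject) into a wide one (one column per day). The following is a minimal sketch of that pivot on invented toy values; the numbers and the frame name `long_df` are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy long-format data: one row per (day, subject) with its daily share in %
long_df = pd.DataFrame({
    "date_parsed": ["2021-05-01", "2021-05-01", "2021-05-02"],
    "source": ["blick_pos", "srf_pos", "blick_pos"],
    "sentiment %": [70.0, 30.0, 100.0],
})

# Rows become subjects, columns become dates
wide = long_df.pivot(index="source", columns="date_parsed", values="sentiment %")

# A subject with no articles on a given day has no value there; fill it with 0
wide = wide.fillna(0).reset_index()
print(wide)
```

Here `srf_pos` has no row for 2021-05-02, so the pivot leaves that cell empty and `fillna(0)` turns it into 0.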
newspaper_source = [
'20 minuten',
'blick',
#'bluewin.ch',
#'finews.ch',
#'nau.ch',
#'neue zürcher zeitung',
'srf',
#'telebasel',
#'tages-anzeiger',
#'watson'
]
news_small = news_concat[news_concat.source.isin(newspaper_source)]
# Filter data between two dates
filtered_df = news_small.loc[(news_small['date_parsed'] >= '2021-05-01') & (news_small['date_parsed']<= '2021-05-31')]
filtered_df
| | source | content | category | sentiment | date_parsed |
|---|---|---|---|---|---|
| 0 | 20 minuten | Verletzung im Schädelinneren : Frau lief nach Corona-Test Hirnwasser aus dem Kopf. In Osnabrück ist eine Frau beim Corona-Schnelltest im Inneren ihres Schädels verletzt worden. Danach lief ihr wochenlang Hirnwasser aus dem Kopf. | world | 0 | 2021-05-01 |
| 1 | blick | USA: Freizeitpark wieder auf. 13 Monate lang war Disneyland wegen der Corona-Pandemie stillgelegt, nun hat der beliebte Freizeitpark in Kalifornien wieder auf. | world | 1 | 2021-05-01 |
| 2 | 20 minuten | Verdacht auf Menschenschmuggel : US-Polizei findet 91 Menschen ohne Papiere in Wohnhaus. Auf Hinweis einer Entführung finden Polizeibeamte in Houston, im US-Bundesstaat Texas, 91 Frauen und Männer ohne gültige Aufenthaltspapiere. | world | 0 | 2021-05-01 |
| 10 | blick | Ironman: Ryf läuft mit Streckenrekord zu Sieg in St. George. In St. George feiert Daniela Ryf ihren zweiten Saisonsieg in einem 70.3-Ironman. Ein gutes Omen für die Mitteldistanz-WM, die im September auf der gleichen Strecke durchgeführt wird. | world | 1 | 2021-05-01 |
| 12 | srf | 32. Runde der Super League - Luzern verschafft sich weiter Luft im Abstiegskampf. Der FC Luzern gewinnt bei Vaduz 2:1 und baut den Vorsprung auf den Barrageplatz auf 9 Punkte aus. | world | 1 | 2021-05-01 |
| ... | ... | ... | ... | ... | ... |
| 4475 | blick | Premier League: Liverpool gewinnt Nachholspiel gegen ManUtd. Liverpool holt sich im Nachholspiel gegen Manchester United einen wichtigen Sieg. Vor der Partie kommts allerdings erneut zu Fan-Protesten. | sport | 1 | 2021-05-13 |
| 4476 | blick | Nach Fotos von Kobe Bryants Absturz - zwei Feuerwehrmänner gefeuert. Sie waren bei Kobe Bryants (†41) Helikopterabsturz im Einsatz: Zwei Feuerwehrleute in Los Angeles verlieren ihren Job, weil sie Fotos von der Unfallstelle gemacht haben. | sport | 0 | 2021-05-13 |
| 4477 | blick | Radsport: Gino Mäder gewinnt Bergankunft am Giro. Drei Kilometer vor dem Ziel lässt der Schweizer Gino Mäder (24, Bahrain Victorious) seine letzten beiden Begleiter stehen und gewinnt die 6. Etappe des Giro d'Italia in Ascoli Piceno. | sport | 1 | 2021-05-13 |
| 4478 | srf | Schweizer Sieg beim Giro - Paukenschlag beim Giro: Gino Mäder siegt auf der 6. Etappe. Der Fahrer vom Team Bahrain Victorious siegt in Ascoli Pieno. Zuvor hatte er sich von einer Ausreissergruppe abgesetzt. | sport | 1 | 2021-05-13 |
| 4479 | srf | Final im DFB-Pokal - Gegen Leipzig: Holen sich Bürki und der BVB den "Pott"?. Dortmund will mit einem Pokalsieg eine durchwachsene Saison «retten» – Leipzig den ersten Titel überhaupt holen. | sport | 1 | 2021-05-13 |
1895 rows × 5 columns
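The date filter above keeps rows whose `date_parsed` lies between two bounds, both included. As a hedged alternative sketch, assuming `date_parsed` is stored as a pandas datetime column (the notebook does not show its dtype here), the same selection can be written with `Series.between`. The toy frame `articles` is invented for illustration.

```python
import pandas as pd

# Toy data: two rows inside May 2021, two outside
articles = pd.DataFrame({
    "source": ["20 minuten", "blick", "srf", "blick"],
    "date_parsed": pd.to_datetime(
        ["2021-04-30", "2021-05-01", "2021-05-31", "2021-06-01"]
    ),
})

start = pd.Timestamp("2021-05-01")
end = pd.Timestamp("2021-05-31")

# between() includes both bounds by default
in_may = articles[articles["date_parsed"].between(start, end)]
print(in_may)
```

If the column carried a time of day, an upper bound of `2021-05-31` would mean midnight and drop later articles from that day, so the bound would need to be adjusted in that case.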
source_time = create_df_time(filtered_df, 'source')
source_time
| date_parsed | source | 2021-05-01 | 2021-05-02 | 2021-05-03 | 2021-05-04 | 2021-05-05 | 2021-05-06 | 2021-05-07 | 2021-05-08 | 2021-05-09 | ... | 2021-05-22 | 2021-05-23 | 2021-05-24 | 2021-05-25 | 2021-05-26 | 2021-05-27 | 2021-05-28 | 2021-05-29 | 2021-05-30 | 2021-05-31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 minuten_neg | 34.38 | 57.58 | 39.39 | 42.5 | 45.00 | 38.64 | 29.03 | 24.24 | 41.18 | ... | 29.41 | 20.69 | 31.25 | 41.07 | 30.23 | 50.94 | 36.11 | 22.58 | 53.85 | 56.25 |
| 1 | 20 minuten_pos | 9.09 | 35.29 | 43.75 | 35.0 | 41.18 | 39.29 | 47.83 | 33.33 | 44.44 | ... | 23.81 | 36.84 | 11.76 | 32.35 | 44.44 | 33.33 | 28.00 | 38.10 | 37.50 | 23.53 |
| 2 | blick_neg | 56.25 | 36.36 | 42.42 | 42.5 | 40.00 | 50.00 | 61.29 | 51.52 | 35.29 | ... | 52.94 | 37.93 | 59.38 | 35.71 | 55.81 | 33.96 | 44.44 | 54.84 | 30.77 | 31.25 |
| 3 | blick_pos | 63.64 | 47.06 | 37.50 | 40.0 | 38.24 | 35.71 | 30.43 | 33.33 | 22.22 | ... | 23.81 | 36.84 | 64.71 | 44.12 | 33.33 | 36.67 | 32.00 | 47.62 | 37.50 | 23.53 |
| 4 | srf_neg | 9.38 | 6.06 | 18.18 | 15.0 | 15.00 | 11.36 | 9.68 | 24.24 | 23.53 | ... | 17.65 | 41.38 | 9.38 | 23.21 | 13.95 | 15.09 | 19.44 | 22.58 | 15.38 | 12.50 |
| 5 | srf_pos | 27.27 | 17.65 | 18.75 | 25.0 | 20.59 | 25.00 | 21.74 | 33.33 | 33.33 | ... | 52.38 | 26.32 | 23.53 | 23.53 | 22.22 | 30.00 | 40.00 | 14.29 | 25.00 | 52.94 |
6 rows × 32 columns
HTML('''<div class="flourish-embed flourish-chart" data-src="visualisation/6126744"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')
big_filtered_df = news_concat.loc[(news_concat['date_parsed'] >= '2021-05-01') & (news_concat['date_parsed']<= '2021-05-31')]
cat_time = create_df_time(big_filtered_df, 'category')
cat_time
| date_parsed | category | 2021-05-01 | 2021-05-02 | 2021-05-03 | 2021-05-04 | 2021-05-05 | 2021-05-06 | 2021-05-07 | 2021-05-08 | 2021-05-09 | ... | 2021-05-22 | 2021-05-23 | 2021-05-24 | 2021-05-25 | 2021-05-26 | 2021-05-27 | 2021-05-28 | 2021-05-29 | 2021-05-30 | 2021-05-31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | business_neg | 3.70 | 4.55 | 11.84 | 8.14 | 10.11 | 8.33 | 4.23 | 0.00 | 4.11 | ... | 5.71 | 0.00 | 2.99 | 5.22 | 11.70 | 6.03 | 6.02 | 8.11 | 1.79 | 5.88 |
| 1 | business_pos | 4.35 | 4.76 | 8.00 | 16.07 | 21.31 | 14.29 | 3.45 | 6.12 | 2.50 | ... | 0.00 | 7.69 | 2.04 | 19.77 | 9.52 | 17.65 | 4.84 | 6.52 | 4.44 | 2.13 |
| 2 | entertainment_neg | 9.26 | 10.61 | 3.95 | 11.63 | 10.11 | 6.25 | 11.27 | 5.56 | 8.22 | ... | 8.57 | 7.14 | 10.45 | 12.17 | 14.89 | 11.21 | 14.46 | 9.46 | 10.71 | 11.76 |
| 3 | entertainment_pos | 4.35 | 7.14 | 8.00 | 10.71 | 21.31 | 17.46 | 10.34 | 2.04 | 5.00 | ... | 10.87 | 20.51 | 22.45 | 11.63 | 17.46 | 5.88 | 17.74 | 10.87 | 13.33 | 14.89 |
| 4 | health_neg | 9.26 | 9.09 | 22.37 | 15.12 | 13.48 | 17.71 | 15.49 | 26.39 | 20.55 | ... | 11.43 | 11.43 | 5.97 | 12.17 | 6.38 | 12.07 | 9.64 | 22.97 | 8.93 | 12.94 |
| 5 | health_pos | 6.52 | 7.14 | 12.00 | 10.71 | 3.28 | 15.87 | 8.62 | 24.49 | 17.50 | ... | 8.70 | 5.13 | 6.12 | 6.98 | 11.11 | 0.00 | 8.06 | 8.70 | 8.89 | 6.38 |
| 6 | nation_neg | 12.96 | 9.09 | 9.21 | 10.47 | 7.87 | 6.25 | 8.45 | 8.33 | 12.33 | ... | 11.43 | 4.29 | 11.94 | 7.83 | 6.38 | 10.34 | 13.25 | 5.41 | 16.07 | 9.41 |
| 7 | nation_pos | 4.35 | 0.00 | 6.00 | 0.00 | 3.28 | 4.76 | 10.34 | 4.08 | 2.50 | ... | 4.35 | 0.00 | 4.08 | 0.00 | 7.94 | 2.94 | 4.84 | 0.00 | 2.22 | 0.00 |
| 8 | science_neg | 14.81 | 15.15 | 14.47 | 13.95 | 16.85 | 17.71 | 18.31 | 5.56 | 12.33 | ... | 12.86 | 18.57 | 13.43 | 13.04 | 15.96 | 13.79 | 13.25 | 9.46 | 19.64 | 21.18 |
| 9 | science_pos | 19.57 | 14.29 | 24.00 | 17.86 | 13.11 | 11.11 | 20.69 | 16.33 | 10.00 | ... | 19.57 | 2.56 | 12.24 | 8.14 | 9.52 | 14.71 | 17.74 | 13.04 | 11.11 | 12.77 |
| 10 | sport_neg | 12.96 | 22.73 | 7.89 | 12.79 | 12.36 | 15.62 | 12.68 | 16.67 | 15.07 | ... | 12.86 | 5.71 | 22.39 | 8.70 | 13.83 | 10.34 | 10.84 | 10.81 | 14.29 | 8.24 |
| 11 | sport_pos | 19.57 | 30.95 | 8.00 | 16.07 | 14.75 | 20.63 | 10.34 | 16.33 | 22.50 | ... | 28.26 | 38.46 | 24.49 | 18.60 | 12.70 | 19.12 | 14.52 | 19.57 | 24.44 | 25.53 |
| 12 | technology_neg | 3.70 | 6.06 | 5.26 | 4.65 | 6.74 | 3.12 | 7.04 | 6.94 | 2.74 | ... | 10.00 | 11.43 | 2.99 | 6.09 | 3.19 | 0.86 | 7.23 | 8.11 | 3.57 | 5.88 |
| 13 | technology_pos | 15.22 | 9.52 | 10.00 | 7.14 | 4.92 | 1.59 | 8.62 | 4.08 | 10.00 | ... | 4.35 | 5.13 | 14.29 | 10.47 | 9.52 | 7.35 | 6.45 | 10.87 | 6.67 | 10.64 |
| 14 | world_neg | 33.33 | 22.73 | 25.00 | 23.26 | 22.47 | 25.00 | 22.54 | 30.56 | 24.66 | ... | 27.14 | 41.43 | 29.85 | 34.78 | 27.66 | 35.34 | 25.30 | 25.68 | 25.00 | 24.71 |
| 15 | world_pos | 26.09 | 26.19 | 24.00 | 21.43 | 18.03 | 14.29 | 27.59 | 26.53 | 30.00 | ... | 23.91 | 20.51 | 14.29 | 24.42 | 22.22 | 32.35 | 25.81 | 30.43 | 28.89 | 27.66 |
16 rows × 32 columns
HTML('''<div class="flourish-embed flourish-chart" data-src="visualisation/6167385"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')
cat_time.to_csv('cat_time.csv')
source_time.to_csv('source_time.csv')